[70B-Part2] Improved save model (that can work with FSDP) #107
The fixes in this PR might not make much sense on their own, but here's what's changing:
1. `no_split_modules` is now set dynamically. Apparently the previous approach (listing all possible classes) leads to an error, since every listed class is expected to be present in the model.
2. The `state_dict` and `load_state_dict` logic is slightly modified in how it is applied:
   a. using the existing register_hook method instead
   b. changing `save_pretrained` instead of `state_dict`. TODO: This might also end up fixing some of the warnings we were seeing and suppressing (not tested yet).
3. `train.py` reverts to using `trainer.save_model` instead of the pipeline (in order to work with FSDP), but we will still save the pipeline code and configs.
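
To illustrate the dynamic `no_split_modules` change, here is a minimal sketch of the idea, not the code in this PR: keep only the candidate class names that actually occur in the model. The helper name `present_no_split_modules`, the stand-in classes, and the candidate names are all hypothetical; the only assumption about the model is that it exposes a `.modules()` iterator, as `torch.nn.Module` does.

```python
from typing import Iterable, List


def present_no_split_modules(model, candidates: Iterable[str]) -> List[str]:
    """Return the candidate class names that actually occur in `model`.

    `model` only needs a `.modules()` method yielding itself and all
    submodules, which is what torch.nn.Module provides.
    """
    present = {type(m).__name__ for m in model.modules()}
    # Preserve candidate order; drop names the model does not contain.
    return [name for name in candidates if name in present]


# Tiny stand-ins so the sketch runs without torch installed (hypothetical).
class _Node:
    def __init__(self, *children):
        self.children = children

    def modules(self):
        yield self
        for child in self.children:
            yield from child.modules()


class DecoderLayerA(_Node):
    pass


class ToyModel(_Node):
    pass


model = ToyModel(DecoderLayerA(), DecoderLayerA())
# Hypothetical candidate list: only one of these classes is in the model.
candidates = ["DecoderLayerA", "DecoderLayerB"]
no_split = present_no_split_modules(model, candidates)
print(no_split)  # ['DecoderLayerA']
```

Filtering against the instantiated model avoids naming classes that are absent, which is the failure mode described in item 1.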